Ch07 - Transactions
Chapter 7: Transactions
Purpose of transactions: simplify the programming model
ACID definition is slippery
- Atomicity:
- different from normal meaning in concurrency programming (nobody else sees a partially updated state)
- means more abortability - when failure, all partial updates are reverted
- Consistency:
- overloaded term
- means some invariants to be true - but it's more up to application.
- 'C' doesn't really belong
- Isolation:
- concurrently executing transactions are isolated from each other: they cannot step on each other’s toes
- i.e. serializability - result is the same as if they had run serially
- in practice, strictly serializable isolation is rarely used. Oracle 11g doesn’t even implement it.
- Durability
- once committed, data written is stored
- persisted to disk in single node
- or replicated in multi-node
- once committed, data written is stored
single-object vs multi
- Atomicity and isolation apply both to single and multi-object
- on single object:
- almost universally provided
- Atomicity - log for crash recovery
- isolation - a lock on each object
- some provide atomic operations e.g. increment or compare-and-et
- almost universally provided
- multi-object:
- in relational model, foreign key constraints to other table has to be met
- in document, need transactions to prevent denormalized data going out of sync
- update sconedary indexes
- on single object:
- handling error
- ACID databases abandon transactino entirely if at risk of ACID
- quorum storage - "best "effort"
Weak Isolation Levels
Serializable isolation has a performance cost, and many databases - including many popular relational DBs don’t want to pay that price.
Read Committed
- when reading, only see committed data - no dirty reads
- when writing, only overwrite data that has been fully committed - no dirty writes
- very popular - deulat in Oracle 11g, PostgreSQL, SQL Server 2012 etc.
- implementation
- prevent dirty writes by row level locks
- prevent dirty reads: not by read locks, but by keeping the old commited value and the new value set by the transaction that currently holds the write lock.
Snapshot Isolation and Repeatable Read
- reads might see effect of part of the transaction
- use case where it's important and useful: long-running, read-only queries
- backup
- Analytic queries and integrity checks
- Snapshot Isolation: each transaction reads from a consistent snapshot of the database
- Implementation
- prevent dirty writes by locks
- no read locks - instead, keep several different committed versions of an object - multi-version concurrency control (MVCC).
- always-increasing transaction ID
- when reading, only read the latest committed version of the row with write tx id < read tx id
- Indexes
- option 1: index simply point to all versions of an object, filter at read time
- option 2: (PostgreSQL) avoid index updates if diff version on the same page
- option 3: append-only B-trees - updates generate a new page. all parents up to root is copied and updated to new. a particular root is a consistent snapshot of the database at the point in time when it was created. require background compaction and garbage collection.
- naming confusion
- different DB call it differently, and actual guarantees are different
- because SQL standard has flawed defintion for repeatable read -